Conversation
Signed-off-by: Junpu Fan <junpu@amazon.com>
sirutBuasai left a comment:
We'll most likely need the CUDA compat script as well; not sure how vLLM has been working thus far. https://github.com/aws/deep-learning-containers/blob/eb524f7c0737b007cf06d4fd36f67de246cc8d8f/sglang/build_artifacts/start_cuda_compat.sh
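For context, a CUDA compat script typically enables the container's CUDA forward-compatibility libraries only when the host driver is older than the driver the container's CUDA build expects. A minimal sketch of that kind of check follows; the variable names and versions are illustrative, not taken from the linked DLC script:

```shell
set -euo pipefail

# Hypothetical paths/versions for illustration only.
CUDA_COMPAT_PATH="/usr/local/cuda/compat"
COMPAT_REQUIRED_DRIVER="550.54.15"   # driver the CUDA toolkit was built against

# Succeeds when $1 >= $2 under version-sort ordering.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | tail -n1)" = "$1" ]
}

# In a real script this would come from nvidia-smi on the host.
HOST_DRIVER="535.104.05"

if version_ge "$HOST_DRIVER" "$COMPAT_REQUIRED_DRIVER"; then
    echo "host driver $HOST_DRIVER is new enough; compat libs not needed"
else
    echo "host driver $HOST_DRIVER is older; enabling $CUDA_COMPAT_PATH"
    # Put the compat libraries ahead of the system CUDA driver libraries.
    export LD_LIBRARY_PATH="${CUDA_COMPAT_PATH}:${LD_LIBRARY_PATH:-}"
fi
```

The actual start_cuda_compat.sh may differ in how it detects the driver and where it places the compat libraries.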
Not running any SageMaker tests yet.
```yaml
- name: Download image URI artifact
  uses: actions/download-artifact@v4
  with:
    name: vllm-rayserve-ec2-image-uri

- name: Resolve image URI for test
  run: |
    IMAGE_URI=$(cat image_uri.txt)
    echo "Resolved image URI: $IMAGE_URI"
    echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV

- name: Pull image
  run: |
    docker pull $IMAGE_URI

- name: Checkout vLLM Tests
  uses: actions/checkout@v5
  with:
    repository: vllm-project/vllm
    ref: v0.10.2
    path: vllm_source
```
I wonder if there's a way to DRY these steps; they're going to be used repeatedly across multiple stages.
Later we can refactor common patterns into reusable (callable) workflows or similar.
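One way such a refactor could look is a reusable workflow invoked via `workflow_call`, with the artifact name passed as an input. This is a hedged sketch only; the file name and input names are hypothetical, not part of this PR:

```yaml
# .github/workflows/resolve-and-pull-image.yml (hypothetical file name)
name: Resolve and pull image
on:
  workflow_call:
    inputs:
      artifact-name:
        required: true
        type: string

jobs:
  resolve:
    runs-on: ubuntu-latest
    steps:
      - name: Download image URI artifact
        uses: actions/download-artifact@v4
        with:
          name: ${{ inputs.artifact-name }}
      - name: Resolve image URI
        run: |
          IMAGE_URI=$(cat image_uri.txt)
          echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV
      - name: Pull image
        run: docker pull $IMAGE_URI
```

A caller job would then use `uses: ./.github/workflows/resolve-and-pull-image.yml` with `artifact-name: vllm-rayserve-ec2-image-uri`, instead of repeating the three steps per stage.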
```python
parser = argparse.ArgumentParser()
parser.add_argument(
    "--framework",
    choices=["tensorflow", "mxnet", "pytorch", "base", "vllm"],
```
We can probably change this list to ["tensorflow", "pytorch", "base", "vllm", "sglang"].
Actually, this telemetry piece doesn't work, because the template replacement logic doesn't exist; it will need to be fixed separately.
```python
)
parser.add_argument(
    "--container-type",
    choices=["training", "inference", "general"],
```
Also, a side note unrelated to this PR: currently vllm and sglang are classified as general. We should change these to inference.
The overall telemetry integration needs to be fixed.
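Putting the two review suggestions together (drop "mxnet", add "sglang"; classify vllm/sglang as inference rather than general), the parser might end up looking like this. This is a sketch of the proposed change, not the merged code:

```python
import argparse

# Choices reflect the review suggestions: "mxnet" removed, "sglang" added,
# and "general" dropped in favor of "inference" for vllm/sglang images.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--framework",
    choices=["tensorflow", "pytorch", "base", "vllm", "sglang"],
    required=True,
)
parser.add_argument(
    "--container-type",
    choices=["training", "inference"],
    default="inference",
)

# Example invocation, as a CLI would pass it.
args = parser.parse_args(["--framework", "vllm"])
print(args.framework, args.container_type)  # → vllm inference
```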
A sample PR build-and-test workflow.